Skip to content

Conversation

@ranj063
Copy link
Collaborator

@ranj063 ranj063 commented Sep 21, 2021

This reverts commit d8a19a4 to re-enable trace filtering.

And introduces the reset op in DMA ops to perform actions that must be preceded by DMA stop on the host-side.

This along with Linux PR thesofproject/linux#3167 fixes #4479 and #4558

@ranj063 ranj063 changed the title [TEST ONLY] [DO NOT REVIEW] Revert "Revert "trace: Kconfig: disable filtering by default"" Re-enable trace filtering and implement DMA reset op Sep 22, 2021
@ranj063 ranj063 force-pushed the fix/trace_filtering branch 2 times, most recently from 4bc42bd to 2a638df Compare September 22, 2021 03:13
@ranj063 ranj063 force-pushed the fix/trace_filtering branch from d63208d to 239b79b Compare September 22, 2021 07:37
@lgirdwood
Copy link
Member

@XiaoyunWu6666 can you give this PR and kernel PR 3617 above some stress test validation on CML platforms. Thanks !

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, lets see what the stress testing brings. Fingers crossed !

@ranj063 ranj063 force-pushed the fix/trace_filtering branch 2 times, most recently from eb3f4e4 to 6c32ab6 Compare September 22, 2021 15:43
@fredoh9
Copy link
Contributor

fredoh9 commented Sep 22, 2021

SOFCI TEST

@XiaoyunWu6666
Copy link
Contributor

XiaoyunWu6666 commented Sep 23, 2021

@XiaoyunWu6666 can you give this PR and kernel PR 3617 above some stress test validation on CML platforms. Thanks !

Yes,I will run daily test for this recipe and provide feedback ASAP.
But is it a typo ? I could not find kernel PR 3617 @lgirdwood

Copy link
Contributor

@keyonjie keyonjie Sep 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need some timeout check here to wait for the flushing finish?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keyonjie I checked this in the cavs fw and they do not have a wait before checking for GBUSY

@lgirdwood
Copy link
Member

@XiaoyunWu6666 can you give this PR and kernel PR 3617 above some stress test validation on CML platforms. Thanks !

Yes,I will run daily test for this recipe and provide feedback ASAP.
But is it a typo ? I could not find kernel PR 3617 @lgirdwood

Sorry, my bad - I mean thesofproject/linux#3167

revert the revert just for testing.

This reverts commit d8a19a4.

Signed-off-by: Ranjani Sridharan <[email protected]>
Add a new reset op in struct dma_ops that can be used
to perform actions after the DMA is stopped during
host_reset() and dai_reset().

Signed-off-by: Ranjani Sridharan <[email protected]>
Implement the reset op for HDA DMA. The recommended
HW programming sequence for HDA DMA calls for
keeping the DGCS_GEN bit set to 1 when the host
DMA is stopped by the host. It should be cleared
during the reset op and after resetting, the GBUSY
bit must be verfied to ensure that the DMA is idle.

Also, because we do not clear the GEN bit until dma_reset,
skip checking for the status during start to avoid issues
with restarting the DMA during pause release.

Signed-off-by: Ranjani Sridharan <[email protected]>
Trace DMA needs to be reset after it has been stopped.
Since there is no IPC for freeing trace, use the CTX_SAVE
IPC to ensure that the trace DMA is reset and freed
during suspend.

Signed-off-by: Ranjani Sridharan <[email protected]>
@ranj063 ranj063 force-pushed the fix/trace_filtering branch from 6c32ab6 to 88795bd Compare September 23, 2021 15:04
@ranj063
Copy link
Collaborator Author

ranj063 commented Sep 23, 2021

@kkarask could you please check why the mergbuild is shows as failed but there are no failures shown?

@marc-hb
Copy link
Collaborator

marc-hb commented Sep 23, 2021

@ranj063 the Kconfig revert has already been merged in commit d8a19a4. While not needed a rebase would clear that up.

@ranj063
Copy link
Collaborator Author

ranj063 commented Sep 23, 2021

@ranj063 the Kconfig revert has already been merged in commit d8a19a4. While not needed a rebase would clear that up.

but @marc-hb this reverts the revert. You cant reproduce the original problem without it

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't merge this before that DMA test workaround has been turned off:
thesofproject/sof-test#770

@kkarask could you please check why the mergbuild is shows as failed but there are no failures shown?

Which one? https://sof-ci.01.org/sof-pr-viewer/#/build/PR4793/build7359329 is all green.

Reporting failures without a link is not useful because the links change all the time.

@marc-hb
Copy link
Collaborator

marc-hb commented Sep 23, 2021

Does this require thesofproject/linux#3167 ? If yes, what about backward compatibility with older kernels?

https://sof-ci.01.org/sofpr/PR4793/build10449/devicetest/?model=ADLP_RVP_SDW&testcase=multiple-pause-resume-5 fails with one input/output error and a gazillion of underruns

[     6233964.752284] (      167353.437500) c0 hda-dma            ..../intel/hda/hda-dma.c:882  ERROR hda_dma_link_check_xrun(): underrun detected
[     6407965.526620] (      174000.781250) c0 hda-dma            ..../intel/hda/hda-dma.c:882  ERROR hda_dma_link_check_xrun(): underrun detected
[     6582407.498855] (      174441.968750) c0 hda-dma            ..../intel/hda/hda-dma.c:882  ERROR hda_dma_link_check_xrun(): underrun detected
[     6584412.498776] (        2004.999878) c0 hda-dma            ..../intel/hda/hda-dma.c:882  ERROR hda_dma_link_check_xrun(): underrun detected
[     6758430.043944] (      174017.546875) c0 hda-dma            ..../intel/hda/hda-dma.c:882  ERROR hda_dma_link_check_xrun(): underrun detected
[     6760407.856366] (        1977.812378) c0 hda-dma            ..../intel/hda/hda-dma.c:882  ERROR hda_dma_link_check_xrun(): underrun detected
[     8126642.958326] (     1366235.125000) c0 hda-dma            ..../intel/hda/hda-dma.c:882  ERROR hda_dma_link_check_xrun(): underrun detected
[     8128623.999914] (        1981.041626) c0 hda-dma            ..../intel/hda/hda-dma.c:882  ERROR hda_dma_link_check_xrun(): underrun detected
[     8297970.711935] (      169346.718750) c0 hda-dma            ..../intel/hda/hda-dma.c:882  ERROR hda_dma_link_check_xrun(): underrun detected

https://sof-ci.01.org/sofpr/PR4793/build10449/devicetest/?model=BYT_MB_NOCODEC&testcase=test-speaker
has "broken pipe" that I never saw before.

https://sof-ci.01.org/sofpr/PR4793/build1449/devicetest/?model=BDW_WSB_RT286&testcase=multiple-pipeline-capture

has as lot of

[      116658.224531] (       16492.500000) c0 host         1.1           src/audio/host.c:312  INFO host_copy_one_shot(): no bytes to copy
[      116831.193274] (         172.968750) c0 dai          1.9            src/audio/dai.c:964  WARN dai_copy(): Copy_bytes 0 + avail bytes 288 < period bytes 384, possible glitch
[      116889.839105] (          58.645832) c0 dai          1.9            src/audio/dai.c:972  WARN dai_copy(): nothing to copy
[      133325.932202] (       16436.093750) c0 host         1.1           src/audio/host.c:312  INFO host_copy_one_shot(): no bytes to copy
[      133498.640529] (         172.708328) c0 dai          1.9            src/audio/dai.c:972  WARN dai_copy(): nothing to copy
[      149991.556540] (       16492.916016) c0 host         1.1           src/audio/host.c:312  INFO host_copy_one_shot(): no bytes to copy
[      150164.525283] (         172.968750) c0 dai          1.9            src/audio/dai.c:964  WARN dai_copy(): Copy_bytes 0 + avail bytes 288 < period bytes 384, possible glitch
[      150223.171114] (          58.645832) c0 dai          1.9            src/audio/dai.c:972  WARN dai_copy(): nothing to copy
[      166659.680878] (       16436.509766) c0 host         1.1           src/audio/host.c:312  INFO host_copy_one_shot(): no bytes to copy

The underruns and the WARN dai_copy(): nothing to copy are also found in some PASSing/green tests I looked at, example: https://sof-ci.01.org/sofpr/PR4793/build10449/devicetest/?model=CML_RVP_SDW&testcase=multiple-pause-resume-5

ERROR messages in the sof-logger have never caused any test to fail, part of our "green failures" culture :-(

https://sof-ci.01.org/sofpr/PR4793/build10449/devicetest/?model=BDW_WSB_RT286&testcase=multiple-pause-resume-5 and https://sof-ci.01.org/sofpr/PR4793/build10449/devicetest/?model=BYT_MB_NOCODEC&testcase=multiple-pipeline-capture
also have
ERROR dtrace_add_event(): number of dropped logs = X right before the DSP goes to suspend every time.
The former is red (for a different cause) and the latter is green.

@marc-hb
Copy link
Collaborator

marc-hb commented Sep 23, 2021

but @marc-hb this reverts the revert.

Got it, sorry I don't do double-negations :-) I recommend "restore commit 9fadef7" then.

You cant reproduce the original problem without it

It wasn't clear this PR is only for testing for now, it's not a draft and there's no [DNM] or [TEST]

The original commit turned off two settings unrelated to each other. There was a rationale for turning off one but not for the other.

@marc-hb
Copy link
Collaborator

marc-hb commented Sep 23, 2021

but @marc-hb this reverts the revert.

Got it, sorry I don't do double-negations :-) I recommend "restore commit 9fadef7" then.

It's actually a triple negation: Revert Revert disable filtering

Looks like triple is enough to confuse even the best: right now the PR name is "Re-enable trace filtering" which I now think is why I got confused.

EDIT: it's actually a quadruple negation if you consider "filtering" as negative = less trace. So quadruple negation = more trace => more failures.

@ranj063
Copy link
Collaborator Author

ranj063 commented Sep 23, 2021

Looks like triple is enough to confuse even the best: right now the PR name is "Re-enable trace filtering" which I now think is why I got confused.

no I will rename this to "Fix HD-DMA stop sequence" later on. and remove the first patch. If needed, whoever needs that can add it back. It doesnt belong in this series.

@XiaoyunWu6666
Copy link
Contributor

FYI , run daily for CML platforms with kernel PR 3167 and sof PR 4793 again since @ranj063 has updated this PR.
Let's wait to see the result

@XiaoyunWu6666
Copy link
Contributor

XiaoyunWu6666 commented Sep 24, 2021

@ranj063 looks #4558 and #4779 does not appear with this PR along with Linux PR thesofproject/linux#3167
(just did not appear in one test , not guaranteed that the issues were gone : see inner testresult 6821 )

But I did see a bunch of underruns when multiple-pause-resume on three CML platforms: CML_RVP_SDW CML_SKU0983_SDW CML_SKU0955_HDA , causing the testcase to fail after all pause-resume finished.

Also see ipc error when PCM_FREE on CML_SKU0955_HDA , when multiple-pipeline-playback

@ranj063
Copy link
Collaborator Author

ranj063 commented Sep 28, 2021

closing this one for now. Will open a new one

@ranj063 ranj063 closed this Sep 28, 2021
@marc-hb
Copy link
Collaborator

marc-hb commented Sep 29, 2021

#4820 seems related.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants